First, we import the necessary libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from plot_assist import *
df = pd.read_csv('data/201902-fordgobike-tripdata.csv')
df.head()
| | duration_sec | start_time | end_time | start_station_id | start_station_name | start_station_latitude | start_station_longitude | end_station_id | end_station_name | end_station_latitude | end_station_longitude | bike_id | user_type | member_birth_year | member_gender | bike_share_for_all_trip |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 52185 | 2019-02-28 17:32:10.1450 | 2019-03-01 08:01:55.9750 | 21.0 | Montgomery St BART Station (Market St at 2nd St) | 37.789625 | -122.400811 | 13.0 | Commercial St at Montgomery St | 37.794231 | -122.402923 | 4902 | Customer | 1984.0 | Male | No |
| 1 | 42521 | 2019-02-28 18:53:21.7890 | 2019-03-01 06:42:03.0560 | 23.0 | The Embarcadero at Steuart St | 37.791464 | -122.391034 | 81.0 | Berry St at 4th St | 37.775880 | -122.393170 | 2535 | Customer | NaN | NaN | No |
| 2 | 61854 | 2019-02-28 12:13:13.2180 | 2019-03-01 05:24:08.1460 | 86.0 | Market St at Dolores St | 37.769305 | -122.426826 | 3.0 | Powell St BART Station (Market St at 4th St) | 37.786375 | -122.404904 | 5905 | Customer | 1972.0 | Male | No |
| 3 | 36490 | 2019-02-28 17:54:26.0100 | 2019-03-01 04:02:36.8420 | 375.0 | Grove St at Masonic Ave | 37.774836 | -122.446546 | 70.0 | Central Ave at Fell St | 37.773311 | -122.444293 | 6638 | Subscriber | 1989.0 | Other | No |
| 4 | 1585 | 2019-02-28 23:54:18.5490 | 2019-03-01 00:20:44.0740 | 7.0 | Frank H Ogawa Plaza | 37.804562 | -122.271738 | 222.0 | 10th Ave at E 15th St | 37.792714 | -122.248780 | 4898 | Subscriber | 1974.0 | Male | Yes |
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 183412 entries, 0 to 183411
Data columns (total 16 columns):
 #   Column                   Non-Null Count   Dtype
---  ------                   --------------   -----
 0   duration_sec             183412 non-null  int64
 1   start_time               183412 non-null  object
 2   end_time                 183412 non-null  object
 3   start_station_id         183215 non-null  float64
 4   start_station_name       183215 non-null  object
 5   start_station_latitude   183412 non-null  float64
 6   start_station_longitude  183412 non-null  float64
 7   end_station_id           183215 non-null  float64
 8   end_station_name         183215 non-null  object
 9   end_station_latitude     183412 non-null  float64
 10  end_station_longitude    183412 non-null  float64
 11  bike_id                  183412 non-null  int64
 12  user_type                183412 non-null  object
 13  member_birth_year        175147 non-null  float64
 14  member_gender            175147 non-null  object
 15  bike_share_for_all_trip  183412 non-null  object
dtypes: float64(7), int64(2), object(7)
memory usage: 22.4+ MB
descrip_column(df['duration_sec'])
name: duration_sec dtype: int64 null count: 0 unique: [52185 42521 61854 ... 13251 5713 2822] unique count: 4752 max range: 85444 min range: 61 max frequncy: 311 min frequncy: 1
descrip_column(df['start_time'])
name: start_time dtype: object null count: 0 unique: ['2019-02-28 17:32:10.1450' '2019-02-28 18:53:21.7890' '2019-02-28 12:13:13.2180' ... '2019-02-01 00:06:05.5490' '2019-02-01 00:05:34.3600' '2019-02-01 00:00:20.6360'] unique count: 183401 max frequncy: 2 min frequncy: 1
This variable is quantitative and continuous, recorded to sub-second precision, but it can be made discrete by splitting it into year, month, day, and hour. Minutes and seconds are ignored because they will not meaningfully affect our analysis. We will implement that in the Structuring step.
This column has no missing values.
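As a rough illustration of the split described above, here is a minimal sketch on a two-row stand-in frame. The name `sample` and the new column names (`start_year`, `start_month`, `start_day`, `start_hour`) are assumptions made for this example, not names used later in the notebook.

```python
import pandas as pd

# Stand-in for the real data: two start_time strings copied from df.head().
sample = pd.DataFrame({
    "start_time": ["2019-02-28 17:32:10.1450", "2019-02-28 23:54:18.5490"],
})

# Parse the object (string) column into datetime64 values.
sample["start_time"] = pd.to_datetime(sample["start_time"])

# Derive discrete components; minutes and seconds are deliberately dropped.
# The column names below are hypothetical.
sample["start_year"] = sample["start_time"].dt.year
sample["start_month"] = sample["start_time"].dt.month
sample["start_day"] = sample["start_time"].dt.day
sample["start_hour"] = sample["start_time"].dt.hour

print(sample)
```

The same pattern would apply to `end_time`.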
descrip_column(df['end_time'])
name: end_time dtype: object null count: 0 unique: ['2019-03-01 08:01:55.9750' '2019-03-01 06:42:03.0560' '2019-03-01 05:24:08.1460' ... '2019-02-01 00:08:27.2200' '2019-02-01 00:07:54.2870' '2019-02-01 00:04:52.0580'] unique count: 183397 max frequncy: 2 min frequncy: 1
This variable is quantitative and continuous, recorded to sub-second precision, but it can be made discrete by splitting it into year, month, day, and hour. Minutes and seconds are ignored because they will not meaningfully affect our analysis. We will implement that in the Structuring step.
This column has no missing values.
descrip_column(df['start_station_id'])
name: start_station_id dtype: float64 null count: 197 unique: [ 21. 23. 86. 375. 7. 93. 300. 10. 19. 370. 44. 127. 252. 243. 349. 131. 74. 321. 180. 72. 163. 190. 6. 78. 258. 238. 28. 109. 98. 133. 113. 220. 122. 58. 87. 15. 104. 27. 9. 140. 310. 53. 106. 340. 121. 11. 240. 61. 36. 34. 13. 345. 43. 239. 182. 119. 369. 159. 254. 30. 356. 324. 71. 67. 250. 245. 377. 317. 219. 274. 77. 129. 253. 386. 95. 183. 5. 137. 73. 176. 197. 136. 33. 59. 115. 280. 262. 368. 385. 90. 112. 160. 4. 247. 97. 308. 75. 123. 172. 114. 244. 8. 55. 31. 62. 125. 49. 194. 263. 120. 371. 107. 144. 70. 47. 148. 383. 17. 281. 66. 76. 338. 92. 336. 155. 235. 339. 323. 311. 141. 171. 350. 166. 223. 312. 380. 110. 181. 79. 16. 39. 266. 246. 14. 88. 3. 154. 215. 126. 149. 89. 102. 294. 22. 202. 198. 96. 256. 248. 60. 230. 277. 85. 80. 134. 105. 296. 285. 158. 304. 81. 50. 269. 268. 150. 195. 249. 130. 99. nan 270. 101. 355. 52. 343. 116. 305. 341. 365. 241. 174. 276. 381. 207. 193. 118. 51. 91. 363. 46. 186. 108. 156. 211. 20. 372. 196. 139. 151. 357. 18. 42. 64. 201. 284. 146. 147. 203. 56. 138. 279. 170. 237. 259. 398. 179. 373. 24. 167. 29. 26. 100. 41. 327. 145. 54. 291. 273. 25. 124. 315. 267. 214. 251. 232. 191. 275. 35. 309. 255. 187. 242. 337. 132. 278. 175. 178. 63. 292. 364. 152. 316. 210. 318. 216. 282. 162. 313. 298. 295. 164. 142. 351. 218. 257. 358. 233. 173. 153. 307. 192. 287. 157. 314. 265. 169. 378. 361. 344. 360. 290. 288. 84. 297. 213. 212. 283. 362. 200. 286. 359. 177. 299. 206. 205. 306. 231. 227. 236. 388. 222. 168. 204. 189. 188. 271. 221. 389. 217. 228. 225. 303. 209. 289. 229. 301. 226. 234. 224. 37.] unique count: 329 max range: 398.0 min range: 3.0 max frequncy: 3904 min frequncy: 2
start_station_id: this column has 197 missing values. It is stored as float, which is not appropriate for an ID; we should convert it to int.
descrip_column(df['end_station_id'])
name: end_station_id dtype: float64 null count: 197 unique: [ 13. 81. 3. 70. 222. 323. 312. 127. 121. 43. 343. 244. 252. 60. 71. 336. 75. 180. 107. 221. 52. 269. 189. 196. 15. 78. 263. 50. 73. 373. 133. 115. 96. 145. 122. 195. 19. 284. 93. 34. 58. 132. 294. 377. 109. 216. 349. 11. 62. 138. 240. 118. 10. 53. 266. 197. 163. 369. 159. 64. 386. 80. 259. 9. 86. 296. 219. 258. 134. 141. 305. 144. 267. 137. 77. 209. 98. 67. 21. 119. 192. 360. 253. 4. 27. 247. 310. 139. 147. 243. 241. 88. 230. 61. 398. 297. 371. 106. 210. 136. 277. 95. 56. 129. 200. 16. 85. 72. 239. 92. 36. 356. 193. 182. 123. 160. 49. 321. 17. 25. 6. 388. 126. 368. 113. 378. 186. 74. 157. 104. 175. 223. 362. 238. 357. 55. 248. 63. 381. 5. 30. 350. 158. 245. 250. 365. 385. 47. 28. 29. 375. 345. 44. 66. 341. 364. 153. 120. 286. 339. 130. 90. 79. 203. 268. 285. 340. 26. 254. 156. 370. 262. 311. 84. 23. 249. 33. 246. 152. 290. 151. 190. 101. 89. 220. 304. 324. 233. 169. 114. 87. 255. 108. 194. 380. nan 112. 76. 110. 201. 215. 278. 281. 125. 22. 140. 204. 59. 39. 97. 116. 54. 174. 176. 372. 148. 256. 99. 124. 168. 280. 232. 242. 166. 202. 142. 274. 51. 355. 14. 211. 363. 183. 155. 41. 170. 228. 8. 105. 91. 207. 315. 20. 188. 205. 171. 257. 167. 251. 235. 327. 212. 295. 289. 276. 102. 214. 150. 146. 24. 217. 42. 131. 7. 173. 383. 178. 100. 181. 275. 279. 231. 149. 35. 31. 291. 164. 317. 179. 338. 154. 187. 307. 18. 318. 351. 270. 191. 225. 299. 213. 218. 308. 316. 273. 265. 389. 292. 162. 337. 271. 172. 236. 198. 46. 309. 283. 361. 314. 206. 227. 300. 306. 358. 282. 287. 313. 229. 237. 177. 359. 288. 303. 298. 234. 226. 344. 224. 301. 37.] unique count: 329 max range: 398.0 min range: 3.0 max frequncy: 4857 min frequncy: 5
end_station_id: this column has 197 missing values. It is stored as float, which is not appropriate for an ID; we should convert it to int.
len(symmetric_difference(df['start_station_id'].dropna(),df['end_station_id'].dropna()))
0
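One caveat when converting the station IDs to int: a plain NumPy integer dtype cannot hold missing values, so the rows with missing IDs would make a direct `astype(int)` fail. A minimal sketch of one possible approach, assuming pandas' nullable `Int64` dtype is acceptable for the later steps (the small series below is a stand-in, not the real column):

```python
import numpy as np
import pandas as pd

# Stand-in for df['start_station_id']: float IDs with one missing value.
ids = pd.Series([21.0, 23.0, np.nan, 375.0], name="start_station_id")

# "Int64" (capital I) is pandas' nullable integer dtype; missing values
# are kept as <NA> instead of raising an error.
ids_int = ids.astype("Int64")

print(ids_int)
```

Dropping or filling the missing rows first and then using a regular `int64` would be an alternative; which one fits depends on how the missing stations are handled later.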
descrip_column(df['start_station_name'])
name: start_station_name dtype: object null count: 197 unique: ['Montgomery St BART Station (Market St at 2nd St)' 'The Embarcadero at Steuart St' 'Market St at Dolores St' 'Grove St at Masonic Ave' 'Frank H Ogawa Plaza' '4th St at Mission Bay Blvd S' 'Palm St at Willow St' 'Washington St at Kearny St' 'Post St at Kearny St' 'Jones St at Post St' 'Civic Center/UN Plaza BART Station (Market St at McAllister St)' 'Valencia St at 21st St' 'Channing Way at Shattuck Ave' 'Bancroft Way at College Ave' 'Howard St at Mary St' '22nd St at Dolores St' 'Laguna St at Hayes St' '5th St at Folsom' 'Telegraph Ave at 23rd St' 'Page St at Scott St' 'Lake Merritt BART Station' 'West St at 40th St' 'The Embarcadero at Sansome St' 'Folsom St at 9th St' 'University Ave at Oxford St' 'MLK Jr Way at University Ave' 'The Embarcadero at Bryant St' '17th St at Valencia St' 'Valencia St at 16th St' 'Valencia St at 22nd St' 'Franklin Square' 'San Pablo Ave at MLK Jr Way' '19th St at Mission St' 'Market St at 10th St' 'Folsom St at 13th St' 'San Francisco Ferry Building (Harry Bridges Plaza)' '4th St at 16th St' 'Beale St at Harrison St' 'Broadway at Battery St' 'Cesar Chavez St at Dolores St' 'San Fernando St at 4th St' 'Grove St at Divisadero' 'Sanchez St at 17th St' 'Harmon St at Adeline St' 'Mission Playground' 'Davis St at Jackson St' 'Haste St at Telegraph Ave' 'Howard St at 8th St' 'Folsom St at 3rd St' 'Father Alfred E Boeddeker Park' 'Commercial St at Montgomery St' 'Hubbell St at 16th St' 'San Francisco Public Library (Grove St at Hyde St)' 'Bancroft Way at Telegraph Ave' '19th Street BART Station' '18th St at Noe St' 'Hyde St at Post St' '24th St at Market St' 'Vine St at Shattuck Ave' 'San Francisco Caltrain (Townsend St at 4th St)' 'Valencia St at Clinton Park' 'Union Square (Powell St at Post St)' 'Broderick St at Oak St' 'San Francisco Caltrain Station 2 (Townsend St at 4th St)' 'North Berkeley BART Station' 'Downtown Berkeley BART' 'Fell St at Stanyan St' 'San Salvador St at 
9th St' 'Marston Campbell Park' 'Oregon St at Adeline St' '11th St at Natoma St' 'Harrison St at 20th St' 'Haste St at College Ave' '24th St at Bartlett St' 'Sanchez St at 15th St' 'Telegraph Ave at 19th St' 'Powell St BART Station (Market St at 5th St)' 'Jersey St at Castro St' 'Pierce St at Haight St' 'MacArthur BART Station' 'El Embarcadero at Grand Ave' '23rd St at San Bruno Ave' 'Golden Gate Ave at Hyde St' 'S Van Ness Ave at Market St' 'Jackson Playground' 'San Fernando St at 7th St' 'West St at University Ave' 'Myrtle St at Polk St' 'Woolsey St at Sacramento St' 'Townsend St at 7th St' 'Harrison St at 17th St' 'West Oakland BART Station' 'Cyril Magnin St at Ellis St' 'Fulton St at Bancroft Way' '14th St at Mission St' 'San Pedro Square' 'Market St at Franklin St' 'Folsom St at 19th St' 'College Ave at Taft Ave' 'Rhode Island St at 17th St' 'Shattuck Ave at Hearst Ave' 'The Embarcadero at Vallejo St' 'Webster St at Grove St' 'Raymond Kimbell Playground' 'Victoria Manalo Draves Park' '20th St at Bryant St' 'S Park St at 3rd St' 'Lakeshore Ave at Trestle Glen Rd' 'Channing Way at San Pablo Ave' 'Mission Dolores Park' 'Lombard St at Columbus Ave' '17th St at Dolores St' 'Precita Park' 'Central Ave at Fell St' '4th St at Harrison St' 'Horton St at 40th St' 'Golden Gate Ave at Franklin St' 'Embarcadero BART Station (Beale St at Market St)' '9th St at San Fernando St' '3rd St at Townsend St' 'McCoppin St at Valencia St' '13th St at Franklin St' 'Mission Bay Kids Park' 'Potrero Ave and Mariposa St' 'Emeryville Public Market' 'Union St at 10th St' 'Jackson St at 11th St' 'Broadway at Kearny' 'Paseo De San Antonio at 2nd St' 'Valencia St at Cesar Chavez St' 'Rockridge BART Station' '8th St at Brannan St' 'College Ave at Alcatraz Ave' '16th St Mission BART Station 2' 'San Jose Diridon Station' 'Masonic Ave at Turk St' '17th & Folsom Street Park (17th St at Folsom St)' 'Grand Ave at Webster St' '7th St at Brannan St' 'Steuart St at Market St' 'Scott St at Golden Gate 
Ave' 'Parker St at Fulton St' 'Berkeley Civic Center' 'Clay St at Battery St' '11th St at Bryant St' 'Powell St BART Station (Market St at 4th St)' 'Doyle St at 59th St' '34th St at Telegraph Ave' 'Esprit Park' 'Emeryville Town Hall' 'Division St at Potrero Ave' 'Irwin St at 8th St' 'Pierce Ave at Market St' 'Howard St at Beale St' 'Washington St at 8th St' 'Snow Park' 'Dolores St at 15th St' 'Hearst Ave at Euclid Ave' 'Telegraph Ave at Ashby Ave' '8th St at Ringold St' '14th St at Mandela Pkwy' 'Morrison Ave at Julian St' 'Church St at Duboce Ave' 'Townsend St at 5th St' 'Valencia St at 24th St' '16th St at Prosper St' '5th St at Virginia St' "Webster St at O'Farrell St" 'Shattuck Ave at Telegraph Ave' 'Jackson St at 5th St' 'Berry St at 4th St' '2nd St at Townsend St' 'Telegraph Ave at Carleton St' 'Ellsworth St at Russell St' 'Adeline St at 40th St' 'Bay Pl at Vernon St' 'Russell St at College Ave' '22nd St Caltrain Station' 'Folsom St at 15th St' nan 'Ninth St at Heinz Ave' '15th St at Potrero Ave' '23rd St at Tennessee St' 'McAllister St at Baker St' 'Bryant St at 2nd St' 'Mississippi St at 17th St' 'Ryland Park' 'Fountain Alley at S 2nd St' 'Turk St at Fillmore St' 'Ashby BART Station' 'Shattuck Ave at 51st St' 'Julian St at The Alameda' '20th St at Dolores St' 'Broadway at Coronado Ave' 'Grand Ave at Santa Clara Ave' 'Eureka Valley Recreation Center' 'Parker Ave at McAllister St' 'Berry St at King St' 'Salesforce Transit Center (Natoma St at 2nd St)' 'San Antonio Park' 'Lakeside Dr at 14th St' '16th St Mission BART' 'Stanford Ave at Hollis St' 'Broadway at 40th St' 'Mechanics Monument Plaza (Market St at Bush St)' 'Madison St at 17th St' 'Grand Ave at Perkins St' 'Garfield Square (25th St at Harrison St)' '53rd St at Hollis St' '2nd St at Julian St' 'Telegraph Ave at Alcatraz Ave' 'San Francisco City Hall (Polk St at Grove St)' '5th St at Brannan St' '10th St at Fallon St' 'Yerba Buena Center for the Arts (Howard St at 3rd St)' '30th St at San Jose Ave' 
'29th St at Tiffany Ave' 'Webster St at 2nd St' 'Koshland Park' 'Jersey St at Church St' 'Santa Clara St at 7th St' 'Telegraph Ave at 58th St' 'Fruitvale BART Station' 'Addison St at Fourth St' 'Leavenworth St at Broadway' 'Telegraph Ave at 27th St' 'Potrero del Sol Park (25th St at Utah St)' 'Spear St at Folsom St' 'College Ave at Harwood Ave' "O'Farrell St at Divisadero St" '1st St at Folsom St' 'Bryant St at 15th St' 'Golden Gate Ave at Polk St' '5th St at San Salvador St' '29th St at Church St' 'Alamo Square (Steiner St at Fulton St)' 'Autumn Parkway at Coleman Ave' 'Fulton St at Ashby Ave' 'Howard St at 2nd St' '19th St at Florida St' 'Market St at 45th St' 'Derby St at College Ave' 'Market St at Brockhurst St' 'California St at University Ave' 'MLK Jr Way at 14th St' 'Market St at 40th St' 'Julian St at 6th St' 'Cahill Park' 'San Jose City Hall' 'Virginia St at Shattuck Ave' 'Jack London Square' 'Milvia St at Derby St' 'Webster St at 19th St' '24th St at Chattanooga St' 'The Alameda at Bush St' '49th St at Telegraph Ave' 'Broadway at 30th St' 'Bryant St at 6th St' 'Empire St at 1st St' 'China Basin St at 3rd St' '47th St at San Pablo Ave' 'San Salvador St at 1st St' '45th St at Manila' 'San Carlos St at Market St' 'San Pablo Ave at 27th St' 'Market St at Park St' 'Franklin St at 9th St' 'Almaden Blvd at San Fernando St' 'Oak St at 1st St' 'William St at 10th St' 'Isabella St at San Pablo Ave' 'Guerrero Park' '10th St at University Ave' 'DeFremery Park' 'Fifth St at Delaware St' 'Williams Ave at 3rd St' '4th Ave at E 12th St (Temporary Location)' 'Shattuck Ave at 55th St' '59th St at Horton St' 'SAP Center' '37th St at West St' 'Almaden Blvd at Balbach St' '65th St at Hollis St' 'Santa Clara St at Almaden Blvd' 'Ninth St at Parker St' 'Bushrod Park' 'Empire St at 7th St' 'Mendell St at Fairfax Ave' '16th St Depot' 'Newhall St at 3rd St' 'George St at 1st St' 'Mission St at 1st St' 'Duboce Park' 'Locust St at Grant St' '32nd St at Adeline St' 'Mosswood Park' 
'Delmas Ave and San Fernando St' 'Lane St at Revere Ave' '2nd Ave at E 18th St' 'San Carlos St at 11th St' 'Williams Ave at Apollo St' 'MacArthur Blvd at Telegraph Ave' 'Bestor Art Park' 'College Ave at Bryant Ave' 'Miles Ave at Cavour St' 'Saint James Park' '14th St at Filbert St' 'Foothill Blvd at Fruitvale Ave' 'Market St at 8th St' 'Backesto Park (Jackson St at 13th St)' '10th Ave at E 15th St' 'Alcatraz Ave at Shattuck Ave' '55th St at Telegraph Ave' 'Genoa St at 55th St' 'Dover St at 57th St' 'San Pablo Park' '6th Ave at E 12th St (Temporary Location)' 'Taylor St at 9th St' '27th St at MLK Jr Way' 'Foothill Blvd at Harrington Ave' '23rd Ave at Foothill Blvd' 'San Pedro St at Hedding St' '45th St at MLK Jr Way' '5th St at Taylor St' 'Foothill Blvd at 42nd Ave' 'Willow St at Vine St' '26th Ave at International Blvd' 'Farnam St at Fruitvale Ave' '21st Ave at International Blvd' '2nd St at Folsom St'] unique count: 329 max frequncy: 3904 min frequncy: 2
descrip_column(df['end_station_name'])
name: end_station_name dtype: object null count: 197 unique: ['Commercial St at Montgomery St' 'Berry St at 4th St' 'Powell St BART Station (Market St at 4th St)' 'Central Ave at Fell St' '10th Ave at E 15th St' 'Broadway at Kearny' 'San Jose Diridon Station' 'Valencia St at 21st St' 'Mission Playground' 'San Francisco Public Library (Grove St at Hyde St)' 'Bryant St at 2nd St' 'Shattuck Ave at Hearst Ave' 'Channing Way at Shattuck Ave' '8th St at Ringold St' 'Broderick St at Oak St' 'Potrero Ave and Mariposa St' 'Market St at Franklin St' 'Telegraph Ave at 23rd St' '17th St at Dolores St' '6th Ave at E 12th St (Temporary Location)' 'McAllister St at Baker St' 'Telegraph Ave at Carleton St' 'Genoa St at 55th St' 'Grand Ave at Perkins St' 'San Francisco Ferry Building (Harry Bridges Plaza)' 'Folsom St at 9th St' 'Channing Way at San Pablo Ave' '2nd St at Townsend St' 'Pierce St at Haight St' 'Potrero del Sol Park (25th St at Utah St)' 'Valencia St at 22nd St' 'Jackson Playground' 'Dolores St at 15th St' '29th St at Church St' '19th St at Mission St' 'Bay Pl at Vernon St' 'Post St at Kearny St' 'Yerba Buena Center for the Arts (Howard St at 3rd St)' '4th St at Mission Bay Blvd S' 'Father Alfred E Boeddeker Park' 'Market St at 10th St' '24th St at Chattanooga St' 'Pierce Ave at Market St' 'Fell St at Stanyan St' '17th St at Valencia St' 'San Pablo Ave at 27th St' 'Howard St at Mary St' 'Davis St at Jackson St' 'Victoria Manalo Draves Park' 'Jersey St at Church St' 'Haste St at Telegraph Ave' 'Eureka Valley Recreation Center' 'Washington St at Kearny St' 'Grove St at Divisadero' 'Parker St at Fulton St' 'El Embarcadero at Grand Ave' 'Lake Merritt BART Station' 'Hyde St at Post St' '24th St at Market St' '5th St at Brannan St' '24th St at Bartlett St' 'Townsend St at 5th St' 'Addison St at Fourth St' 'Broadway at Battery St' 'Market St at Dolores St' '5th St at Virginia St' 'Marston Campbell Park' 'University Ave at Oxford St' 'Valencia St at 24th St' 'Valencia St at 
Cesar Chavez St' 'Ryland Park' 'Precita Park' 'Derby St at College Ave' 'Jersey St at Castro St' '11th St at Natoma St' '45th St at MLK Jr Way' 'Valencia St at 16th St' 'San Francisco Caltrain Station 2 (Townsend St at 4th St)' 'Montgomery St BART Station (Market St at 2nd St)' '18th St at Noe St' '37th St at West St' 'Newhall St at 3rd St' 'Haste St at College Ave' 'Cyril Magnin St at Ellis St' 'Beale St at Harrison St' 'Fulton St at Bancroft Way' 'San Fernando St at 4th St' 'Garfield Square (25th St at Harrison St)' '29th St at Tiffany Ave' 'Bancroft Way at College Ave' 'Ashby BART Station' '11th St at Bryant St' '14th St at Mandela Pkwy' 'Howard St at 8th St' 'Leavenworth St at Broadway' 'Locust St at Grant St' 'Lombard St at Columbus Ave' 'Sanchez St at 17th St' '45th St at Manila' '23rd St at San Bruno Ave' 'Morrison Ave at Julian St' 'Sanchez St at 15th St' 'Koshland Park' 'Harrison St at 20th St' '2nd Ave at E 18th St' 'Steuart St at Market St' 'Church St at Duboce Ave' 'Page St at Scott St' 'Bancroft Way at Telegraph Ave' 'Mission Bay Kids Park' 'Folsom St at 3rd St' 'Valencia St at Clinton Park' 'Grand Ave at Santa Clara Ave' '19th Street BART Station' 'Folsom St at 19th St' 'West Oakland BART Station' 'S Park St at 3rd St' '5th St at Folsom' 'Embarcadero BART Station (Beale St at Market St)' 'Howard St at 2nd St' 'The Embarcadero at Sansome St' 'Backesto Park (Jackson St at 13th St)' 'Esprit Park' 'Myrtle St at Polk St' 'Franklin Square' 'Empire St at 7th St' 'Lakeside Dr at 14th St' 'Laguna St at Hayes St' '65th St at Hollis St' '4th St at 16th St' '49th St at Telegraph Ave' '16th St Mission BART Station 2' 'Lane St at Revere Ave' 'MLK Jr Way at University Ave' '2nd St at Julian St' 'Webster St at Grove St' 'Telegraph Ave at Ashby Ave' 'Bryant St at 6th St' '20th St at Dolores St' 'Powell St BART Station (Market St at 5th St)' 'San Francisco Caltrain (Townsend St at 4th St)' '8th St at Brannan St' 'Shattuck Ave at Telegraph Ave' 'Downtown Berkeley BART' 
'North Berkeley BART Station' 'Turk St at Fillmore St' 'Woolsey St at Sacramento St' '4th St at Harrison St' 'The Embarcadero at Bryant St' "O'Farrell St at Divisadero St" 'Grove St at Masonic Ave' 'Hubbell St at 16th St' 'Civic Center/UN Plaza BART Station (Market St at McAllister St)' '3rd St at Townsend St' 'Fountain Alley at S 2nd St' 'China Basin St at 3rd St' '59th St at Horton St' 'Mission Dolores Park' 'San Carlos St at 11th St' 'Jackson St at 11th St' '22nd St Caltrain Station' 'Townsend St at 7th St' '7th St at Brannan St' 'Webster St at 2nd St' 'Ellsworth St at Russell St' "Webster St at O'Farrell St" 'Harmon St at Adeline St' '1st St at Folsom St' 'Vine St at Shattuck Ave' 'Stanford Ave at Hollis St' 'Jones St at Post St' 'West St at University Ave' 'Paseo De San Antonio at 2nd St' 'Duboce Park' 'The Embarcadero at Steuart St' 'Russell St at College Ave' 'Golden Gate Ave at Hyde St' 'Berkeley Civic Center' '47th St at San Pablo Ave' 'George St at 1st St' '53rd St at Hollis St' 'West St at 40th St' '15th St at Potrero Ave' 'Division St at Potrero Ave' 'San Pablo Ave at MLK Jr Way' 'Jackson St at 5th St' 'Union Square (Powell St at Post St)' '4th Ave at E 12th St (Temporary Location)' 'Bushrod Park' 'Rhode Island St at 17th St' 'Folsom St at 13th St' 'Virginia St at Shattuck Ave' '16th St Mission BART' 'Lakeshore Ave at Trestle Glen Rd' 'Masonic Ave at Turk St' nan 'Harrison St at 17th St' 'McCoppin St at Valencia St' '17th & Folsom Street Park (17th St at Folsom St)' '10th St at Fallon St' '34th St at Telegraph Ave' 'The Alameda at Bush St' '9th St at San Fernando St' '20th St at Bryant St' 'Howard St at Beale St' 'Cesar Chavez St at Dolores St' '55th St at Telegraph Ave' 'S Van Ness Ave at Market St' 'Scott St at Golden Gate Ave' '14th St at Mission St' 'Mississippi St at 17th St' 'Alamo Square (Steiner St at Fulton St)' 'Shattuck Ave at 51st St' 'MacArthur BART Station' 'Madison St at 17th St' 'Horton St at 40th St' 'Hearst Ave at Euclid Ave' 'Folsom 
St at 15th St' '19th St at Florida St' 'Alcatraz Ave at Shattuck Ave' 'San Fernando St at 7th St' 'MLK Jr Way at 14th St' 'Milvia St at Derby St' 'College Ave at Alcatraz Ave' 'Washington St at 8th St' 'Guerrero Park' 'Oregon St at Adeline St' 'Parker Ave at McAllister St' '23rd St at Tennessee St' 'Clay St at Battery St' 'Broadway at 40th St' 'Salesforce Transit Center (Natoma St at 2nd St)' 'Telegraph Ave at 19th St' 'Emeryville Public Market' 'Golden Gate Ave at Polk St' 'Telegraph Ave at 58th St' 'Foothill Blvd at Harrington Ave' 'The Embarcadero at Vallejo St' '16th St at Prosper St' 'Berry St at King St' 'Broadway at Coronado Ave' 'Market St at 45th St' 'Mechanics Monument Plaza (Market St at Bush St)' 'Dover St at 57th St' 'Miles Ave at Cavour St' 'Rockridge BART Station' 'Fifth St at Delaware St' 'College Ave at Harwood Ave' 'California St at University Ave' 'Union St at 10th St' '5th St at San Salvador St' 'Mosswood Park' 'William St at 10th St' '5th St at Taylor St' 'Julian St at The Alameda' 'Irwin St at 8th St' 'Market St at Brockhurst St' 'Adeline St at 40th St' '30th St at San Jose Ave' 'Spear St at Folsom St' '27th St at MLK Jr Way' 'San Francisco City Hall (Polk St at Grove St)' '22nd St at Dolores St' 'Frank H Ogawa Plaza' 'Shattuck Ave at 55th St' 'Golden Gate Ave at Franklin St' 'Broadway at 30th St' 'Bryant St at 15th St' 'Grand Ave at Webster St' 'Julian St at 6th St' 'Santa Clara St at 7th St' '14th St at Filbert St' 'Emeryville Town Hall' 'Cahill Park' 'Raymond Kimbell Playground' 'Autumn Parkway at Coleman Ave' 'Isabella St at San Pablo Ave' 'San Salvador St at 9th St' 'Telegraph Ave at 27th St' '13th St at Franklin St' 'Doyle St at 59th St' 'Jack London Square' 'SAP Center' 'Telegraph Ave at Alcatraz Ave' 'San Carlos St at Market St' '10th St at University Ave' 'Ninth St at Heinz Ave' 'Market St at 40th St' '23rd Ave at Foothill Blvd' 'Bestor Art Park' '32nd St at Adeline St' 'DeFremery Park' 'San Pedro Square' 'San Salvador St at 1st St' 
'Fulton St at Ashby Ave' 'Ninth St at Parker St' 'Taylor St at 9th St' 'Empire St at 1st St' 'Franklin St at 9th St' 'Webster St at 19th St' 'San Pablo Park' 'College Ave at Taft Ave' 'Market St at 8th St' 'Snow Park' 'San Antonio Park' 'San Jose City Hall' 'Delmas Ave and San Fernando St' 'Mendell St at Fairfax Ave' 'Santa Clara St at Almaden Blvd' 'College Ave at Bryant Ave' 'Foothill Blvd at Fruitvale Ave' 'Palm St at Willow St' 'Saint James Park' 'Williams Ave at 3rd St' 'Market St at Park St' 'Almaden Blvd at Balbach St' 'Almaden Blvd at San Fernando St' 'Foothill Blvd at 42nd Ave' 'Fruitvale BART Station' 'MacArthur Blvd at Telegraph Ave' 'Williams Ave at Apollo St' 'Mission St at 1st St' 'San Pedro St at Hedding St' 'Oak St at 1st St' 'Farnam St at Fruitvale Ave' '26th Ave at International Blvd' '16th St Depot' '21st Ave at International Blvd' 'Willow St at Vine St' '2nd St at Folsom St'] unique count: 329 max frequncy: 4857 min frequncy: 5
end_station_name: this column has 197 missing values.
len(symmetric_difference(df['start_station_name'].dropna(),df['end_station_name'].dropna()))
0
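All four station ID and name columns report 197 missing values, which suggests (but does not prove) that the same rows are affected. A minimal sketch of how that could be checked, using a made-up three-row frame in place of `df`; the names `sample`, `station_cols`, `rows_any`, and `rows_all` are assumptions for this example:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for df: the middle row has no station information.
sample = pd.DataFrame({
    "start_station_id": [21.0, np.nan, 86.0],
    "start_station_name": ["Station A", np.nan, "Station C"],
    "end_station_id": [13.0, np.nan, 3.0],
    "end_station_name": ["Station X", np.nan, "Station Z"],
})

station_cols = [
    "start_station_id", "start_station_name",
    "end_station_id", "end_station_name",
]

null_mask = sample[station_cols].isna()

# Rows missing at least one station field vs. rows missing all four.
rows_any = int(null_mask.any(axis=1).sum())
rows_all = int(null_mask.all(axis=1).sum())

print(rows_any, rows_all)
```

If the two counts are equal on the real data, the missing values all sit in the same rows.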
descrip_column(df['start_station_latitude'])
name: start_station_latitude dtype: float64 null count: 0 unique: [37.7896254 37.791464 37.7693053 37.77483629 37.80456235 37.7704074 37.3172979 37.79539294 37.788975 37.78732677 37.7810737 37.7567083 37.8658466 37.8693603 37.78100972 37.75500026 37.77643482 37.7801457 37.8126783 37.772406 37.7973195 37.8302232 37.80477 37.7737172 37.8723555 37.8717192 37.78716801 37.7633158 37.765052 37.7552126 37.764555 37.8113514 37.760299 37.776619 37.769757 37.795392 37.76704458 37.7880593 37.79857211 37.7478584 37.335885 37.775946 37.7632417 37.849735 37.7592103 37.79728 37.8660431 37.7765126 37.78383 37.7839879 37.794231 37.7664827 37.7787677 37.8688126 37.8090126 37.7610471 37.78734902 37.8160598 37.88022245 37.776598 37.76918818 37.78829998 37.7730627 37.7766392 37.873558 37.870139 37.77191688 37.333955 37.8098236 37.8575672 37.7735069 37.758862 37.86641794 37.75210498 37.7662185 37.8087021 37.78389936 37.750506 37.7717933 37.82840997 37.8088479 37.7544356 37.7816495 37.774814 37.7650259 37.33712237 37.86996671 37.78543383 37.8505777 37.771058 37.7638471 37.8053183 37.78588063 37.8677892 37.7682646 37.336802 37.77379321 37.7605936 37.8417999 37.7644783 37.87367621 37.799953 37.7770527 37.78381271 37.77779057 37.7592005 37.7807601 37.8110807 37.8628271 37.7614205 37.80274615 37.7630152 37.7472996 37.77331088 37.7809546 37.8297046 37.78078713 37.792251 37.3383952 37.77874161 37.77166246 37.80318908 37.77230063 37.76328094 37.84052117 37.8072393 37.80000163 37.79801364 37.333798 37.7479981 37.84427875 37.77143136 37.8513755 37.76476522 37.329732 37.77904666 37.7637085 37.8113768 37.7734919 37.79413 37.7789994 37.8624644 37.8690599 37.795001 37.7700298 37.78637527 37.8419238 37.8225475 37.7616343 37.8312752 37.76921786 37.7668828 37.327581 37.789756 37.8007544 37.80781318 37.7662102 37.87511169 37.8559558 37.7745204 37.8107432 37.3336577 37.7700831 37.77523487 37.7524278 37.764285 37.3259984 37.78352084 37.8332786 37.34875869 37.77588 37.780526 37.8623199 37.85749021 
37.8312769 37.81231409 37.8584732 37.75728841 37.7670373 37.4 37.8534894 37.7670785 37.75536713 37.7774157 37.78317199 37.7648022 37.342725 37.3361883 37.78045006 37.8524766 37.8368013 37.3322326 37.75823842 37.8357883 37.8127441 37.7591769 37.77610091 37.77176211 37.7874921 37.79013985 37.8013189 37.76471009 37.8384435 37.8277573 37.7913 37.80403678 37.80889393 37.7510171 37.8361823 37.34113204 37.85022187 37.77865 37.7767539 37.7976728 37.78487208 37.7423139 37.7440667 37.79519476 37.77341397 37.7509004 37.3391456 37.8444927 37.7752321 37.866249 37.79647069 37.81607312 37.75179165 37.7896767 37.848152 37.7824046 37.78729 37.7671004 37.78127 37.33203868 37.7436839 37.77754677 37.3413348 37.85557366 37.78752178 37.7604469 37.834174 37.8618037 37.8233214 37.87055533 37.8061628 37.8305452 37.3429973 37.32911867 37.337391 37.87657255 37.796248 37.8601246 37.80696976 37.7518194 37.3319323 37.8359455 37.8193814 37.77591022 37.3448821 37.7719996 37.8356322 37.330165 37.83329352 37.330698 37.8178269 37.3324263 37.8005161 37.331415 37.32212463 37.3327938 37.81498823 37.7457388 37.86906048 37.8123315 37.87040712 37.72927865 37.795913 37.8403643 37.8409452 37.332692 37.82669559 37.32673 37.8467842 37.333988 37.8588682 37.8465156 37.34774457 37.73985302 37.76634859 37.73857187 37.3477319 37.3509643 37.41 37.7692005 37.39 37.3229796 37.8238474 37.82489253 37.3302641 37.73172669 37.80021357 37.3364659 37.73016751 37.8262863 37.3236779 37.8381269 37.8388 37.339301 37.80874983 37.7837569 37.8036865 37.35288683 37.7927143 37.84959497 37.8401858 37.8396488 37.8426295 37.85578332 37.794396 37.35306166 37.8170154 37.77993 37.7851915 37.352601 37.8335577 37.3510173 37.7757452 37.3184498 37.781123 37.778058 37.42 37.78485466 37.38 37.78499973] unique count: 334 max range: 37.88022244590679 min range: 37.3172979 max frequncy: 3904 min frequncy: 1
descrip_column(df['end_station_latitude'])
name: end_station_latitude dtype: float64 null count: 0 unique: [37.794231 37.77588 37.78637527 37.77331088 37.7927143 37.79801364 37.329732 37.7567083 37.7592103 37.7787677 37.78317199 37.87367621 37.8658466 37.7745204 37.7730627 37.76328094 37.77379321 37.8126783 37.7630152 37.794396 37.7774157 37.8623199 37.8396488 37.80889393 37.795392 37.7737172 37.8628271 37.780526 37.7717933 37.75179165 37.7552126 37.7650259 37.7662102 37.7436839 37.760299 37.81231409 37.788975 37.78487208 37.7704074 37.7839879 37.776619 37.7518194 37.327581 37.77191688 37.7633158 37.8178269 37.78100972 37.79728 37.77779057 37.7509004 37.8660431 37.7591769 37.79539294 37.775946 37.8624644 37.8088479 37.7973195 37.78734902 37.8160598 37.7767539 37.75210498 37.77523487 37.866249 37.79857211 37.7693053 37.3259984 37.8098236 37.8723555 37.7524278 37.7479981 37.342725 37.7472996 37.8618037 37.750506 37.7735069 37.8335577 37.765052 37.7766392 37.7896254 37.7610471 37.82669559 37.73857187 37.86641794 37.78588063 37.7880593 37.8677892 37.335885 37.7510171 37.7440667 37.8693603 37.8524766 37.7700298 37.8107432 37.7765126 37.79647069 37.3229796 37.80274615 37.7632417 37.83329352 37.7544356 37.3336577 37.7662185 37.77341397 37.758862 37.80021357 37.79413 37.7700831 37.772406 37.8688126 37.77230063 37.78383 37.76918818 37.8127441 37.8090126 37.7605936 37.8053183 37.7807601 37.7801457 37.792251 37.78752178 37.80477 37.35288683 37.7616343 37.78543383 37.764555 37.34774457 37.8013189 37.77643482 37.8467842 37.76704458 37.8359455 37.76476522 37.73172669 37.8717192 37.34113204 37.7770527 37.8559558 37.77591022 37.75823842 37.78389936 37.776598 37.77143136 37.8332786 37.870139 37.873558 37.78045006 37.8505777 37.7809546 37.78716801 37.7824046 37.77483629 37.7664827 37.7810737 37.77874161 37.3361883 37.7719996 37.8409452 37.7614205 37.3364659 37.80000163 37.75728841 37.771058 37.7734919 37.79519476 37.85749021 37.78352084 37.849735 37.78729 37.88022245 37.8384435 37.78732677 37.86996671 37.333798 37.7692005 
37.791464 37.8584732 37.7816495 37.8690599 37.8356322 37.3477319 37.8361823 37.8302232 37.7670785 37.76921786 37.8113514 37.34875869 37.78829998 37.795913 37.8465156 37.7644783 37.769757 37.87657255 37.76471009 37.8110807 37.77904666 37.4 37.7638471 37.77166246 37.7637085 37.7976728 37.8225475 37.3319323 37.3383952 37.7592005 37.789756 37.7478584 37.8401858 37.774814 37.7789994 37.7682646 37.7648022 37.77754677 37.8368013 37.82840997 37.80403678 37.8297046 37.87511169 37.7670373 37.7604469 37.84959497 37.33712237 37.8061628 37.8601246 37.8513755 37.8007544 37.7457388 37.8575672 37.77610091 37.75536713 37.795001 37.8277573 37.7874921 37.8087021 37.84052117 37.78127 37.8444927 37.77993 37.799953 37.764285 37.77176211 37.8357883 37.834174 37.7913 37.8426295 37.8388 37.84427875 37.87040712 37.848152 37.87055533 37.8072393 37.33203868 37.82489253 37.3327938 37.3510173 37.3322326 37.7668828 37.8233214 37.8312769 37.7423139 37.7896767 37.8170154 37.77865 37.75500026 37.80456235 37.8403643 37.78078713 37.8193814 37.7671004 37.8113768 37.3429973 37.3391456 37.80874983 37.8312752 37.32911867 37.78381271 37.3413348 37.81498823 37.333955 37.81607312 37.41 37.80318908 37.8419238 37.796248 37.332692 37.85022187 37.330698 37.86906048 37.8534894 37.8305452 37.7851915 37.3236779 37.8238474 37.8123315 37.336802 37.330165 37.85557366 37.8588682 37.35306166 37.3448821 37.8005161 37.80696976 37.85578332 37.8417999 37.8036865 37.80781318 37.79013985 37.337391 37.3302641 37.73985302 37.333988 37.8381269 37.7837569 37.3172979 37.339301 37.39 37.72927865 37.38 37.3324263 37.32673 37.331415 37.7757452 37.7752321 37.8262863 37.73016751 37.3509643 37.352601 37.32212463 37.778058 37.42 37.781123 37.76634859 37.78485466 37.3184498 37.43 37.78499973] unique count: 335 max range: 37.88022244590679 min range: 37.3172979 max frequncy: 4857 min frequncy: 1
symmetric_difference(df['start_station_latitude'].dropna(),df['end_station_latitude'].dropna())
array([37.43])
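`symmetric_difference` comes from the local `plot_assist` module, whose source is not shown here. The sketch below is only a guess at what such a helper might do, assuming it returns the values that appear in exactly one of the two inputs. The name `symmetric_difference_sketch` and the toy values are illustrative, not taken from the module.

```python
import numpy as np
import pandas as pd


def symmetric_difference_sketch(a, b):
    """Return sorted values found in exactly one of a or b (hypothetical stand-in)."""
    # np.setxor1d de-duplicates each input before comparing them.
    return np.setxor1d(np.asarray(a), np.asarray(b))


# Toy latitude columns: 37.43 appears only among the end stations.
start_lat = pd.Series([37.40, 37.41, 37.40])
end_lat = pd.Series([37.40, 37.41, 37.43])

only_in_one = symmetric_difference_sketch(start_lat, end_lat)
print(only_in_one)
```

A single returned value, as in the output above, would mean that one coordinate is used by one side only.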
descrip_column(df['start_station_longitude'])
name: start_station_longitude dtype: float64 null count: 0 unique: [-122.400811 -122.391034 -122.4268256 -122.44654566 -122.27173805 -122.3911984 -121.884995 -122.40477026 -122.403452 -122.41327822 -122.4117382 -122.421025 -122.2674431 -122.2543374 -122.40566611 -122.4257277 -122.42624402 -122.40307085 -122.2687726 -122.4356498 -122.2653199 -122.2709501 -122.403234 -122.4116467 -122.2664467 -122.2730677 -122.38809792 -122.4219039 -122.4218661 -122.4209752 -122.410345 -122.2734217 -122.418892 -122.417385 -122.415674 -122.394203 -122.3908335 -122.3918648 -122.40086898 -122.4249863 -121.88566 -122.4377775 -122.4306746 -122.270582 -122.4213392 -122.398436 -122.2588044 -122.4113061 -122.39887 -122.412408 -122.402923 -122.39827931 -122.4159292 -122.258764 -122.2682473 -122.4326417 -122.4166511 -122.2782444 -122.26959229 -122.395282 -122.42228508 -122.40853071 -122.4390777 -122.3955263 -122.283093 -122.268422 -122.45370448 -121.877349 -122.2801923 -122.2675583 -122.4160402 -122.412544 -122.25379944 -122.41972357 -122.4310597 -122.2699271 -122.40844488 -122.4339496 -122.4337079 -122.26631463 -122.2496799 -122.4043639 -122.4154077 -122.418954 -122.3987734 -121.88321471 -122.28653312 -122.41962165 -122.2781754 -122.402717 -122.4130036 -122.2948365 -122.40891501 -122.2658964 -122.4201102 -121.8940901 -122.42123902 -122.4148171 -122.2515349 -122.4025701 -122.26848722 -122.398525 -122.4295585 -122.43455887 -122.40643188 -122.409851 -122.3949894 -122.2432677 -122.2902305 -122.4264353 -122.41357863 -122.4264968 -122.4114029 -122.4442926 -122.39974916 -122.2876102 -122.42193371 -122.397086 -121.8807965 -122.39274083 -122.42242321 -122.27057934 -122.39302754 -122.40737736 -122.29352832 -122.2893702 -122.26643801 -122.40595043 -121.886943 -122.4202187 -122.25190043 -122.40578681 -122.2525233 -122.42009103 -121.901782 -122.44729131 -122.4152042 -122.2651925 -122.4036725 -122.39443 -122.4368608 -122.2647911 -122.270556 -122.39997 -122.4117258 -122.40490437 -122.2880451 -122.2663179 
-122.3906477 -122.2856333 -122.40764558 -122.3995794 -121.884559 -122.394643 -122.2748943 -122.26449609 -122.4266136 -122.26055324 -122.2597949 -122.40944937 -122.2914153 -121.9085859 -122.4291557 -122.3974371 -122.4206278 -122.4318042 -121.87712 -122.43115783 -122.2634901 -121.89479783 -122.39317 -122.390288 -122.258801 -122.26157784 -122.2782669 -122.26077855 -122.2532529 -122.39205122 -122.4154425 -121.94 -122.2894154 -122.40735859 -122.38879502 -122.4418376 -122.39357203 -122.3947713 -121.895617 -121.8892765 -122.4319464 -122.2702132 -122.2640037 -121.9125165 -122.42609382 -122.2516207 -122.2472152 -122.4369431 -122.45309293 -122.39843756 -122.39828467 -122.24237323 -122.2626418 -122.41995692 -122.2886647 -122.2567156 -122.399051 -122.26240933 -122.25646019 -122.4119009 -122.2871801 -121.89284384 -122.26017237 -122.41823 -122.3990176 -122.2629973 -122.40087569 -122.4231805 -122.4214722 -122.27396965 -122.4273169 -122.4274114 -121.8841054 -122.261351 -122.2244982 -122.2993708 -122.41685763 -122.2678864 -122.4052155 -122.3904285 -122.2521599 -122.43944585 -122.39438 -122.410662 -122.41874 -121.88176632 -122.4268059 -122.43327409 -121.9031829 -122.26356536 -122.39740491 -122.410807 -122.272968 -122.2535687 -122.2757325 -122.27972031 -122.2760402 -122.2739367 -121.8888891 -121.90457582 -121.886995 -122.26952791 -122.279352 -122.26938441 -122.26658821 -122.4266139 -121.9048882 -122.2623663 -122.2619284 -122.40257501 -121.8969655 -122.3899698 -122.28105068 -121.885831 -122.25622416 -121.888979 -122.2756976 -121.89034939 -122.2720799 -121.8932 -121.8810904 -121.8759263 -122.27484405 -122.42214024 -122.29339957 -122.2851712 -122.29967594 -122.39289612 -122.255547 -122.2644881 -122.2913604 -121.900084 -122.27179706 -121.8892731 -122.2913761 -121.894902 -122.2912095 -122.2653043 -121.8908 -122.38565549 -122.39629179 -122.38961779 -121.899464 -121.9020161 -121.95 -122.4338119 -121.93 -121.8879312 -122.2811926 -122.26043655 -121.8977018 -122.39005566 -122.25381017 
-121.8766132 -122.39896327 -122.2651002 -121.8741186 -122.2512714 -122.258732 -121.889937 -122.28328228 -121.92 -122.2226033 -122.282497 -121.88604981 -122.2487796 -122.26556897 -122.2618225 -122.2717561 -122.267738 -122.28312671 -122.253842 -121.89193726 -122.2717615 -122.2177284 -121.96 -122.2343822 -121.905733 -122.2674183 -121.8959209 -122.2130372 -121.8831724 -122.2329915 -122.2254 -122.23930478 -121.98 -122.39593562] unique count: 335 max range: -121.8741186 min range: -122.45370447635652 max frequncy: 3904 min frequncy: 1
descrip_column(df['end_station_longitude'])
name: end_station_longitude dtype: float64 null count: 0 unique: [-122.402923 -122.39317 -122.40490437 -122.4442926 -122.2487796 -122.40595043 -121.901782 -122.421025 -122.4213392 -122.4159292 -122.39357203 -122.26848722 -122.2674431 -122.40944937 -122.4390777 -122.40737736 -122.42123902 -122.2687726 -122.4264968 -122.253842 -122.4418376 -122.258801 -122.2717561 -122.25646019 -122.394203 -122.4116467 -122.2902305 -122.390288 -122.4337079 -122.4052155 -122.4209752 -122.3987734 -122.4266136 -122.4268059 -122.418892 -122.26077855 -122.403452 -122.40087569 -122.3911984 -122.412408 -122.417385 -122.4266139 -121.884559 -122.45370448 -122.4219039 -122.2756976 -122.40566611 -122.398436 -122.40643188 -122.4274114 -122.2588044 -122.4369431 -122.40477026 -122.4377775 -122.2647911 -122.2496799 -122.2653199 -122.4166511 -122.2782444 -122.3990176 -122.41972357 -122.3974371 -122.2993708 -122.40086898 -122.4268256 -121.87712 -122.2801923 -122.2664467 -122.4206278 -122.4202187 -121.895617 -122.4114029 -122.2535687 -122.4339496 -122.4160402 -122.2674183 -122.4218661 -122.3955263 -122.400811 -122.4326417 -122.27179706 -122.38961779 -122.25379944 -122.40891501 -122.3918648 -122.2658964 -121.88566 -122.4119009 -122.4214722 -122.2543374 -122.2702132 -122.4117258 -122.2914153 -122.4113061 -122.41685763 -121.8879312 -122.41357863 -122.4306746 -122.25622416 -122.4043639 -121.9085859 -122.4310597 -122.4273169 -122.412544 -122.25381017 -122.39443 -122.4291557 -122.4356498 -122.258764 -122.39302754 -122.39887 -122.42228508 -122.2472152 -122.2682473 -122.4148171 -122.2948365 -122.3949894 -122.40307085 -122.397086 -122.39740491 -122.403234 -121.88604981 -122.3906477 -122.41962165 -122.410345 -121.8908 -122.2626418 -122.42624402 -122.2913761 -122.3908335 -122.2623663 -122.42009103 -122.39005566 -122.2730677 -121.89284384 -122.4295585 -122.2597949 -122.40257501 -122.42609382 -122.40844488 -122.395282 -122.40578681 -122.2634901 -122.268422 -122.283093 -122.4319464 -122.2781754 -122.39974916 
-122.38809792 -122.43944585 -122.44654566 -122.39827931 -122.4117382 -122.39274083 -121.8892765 -122.3899698 -122.2913604 -122.4264353 -121.8766132 -122.26643801 -122.39205122 -122.402717 -122.4036725 -122.27396965 -122.26157784 -122.43115783 -122.270582 -122.39438 -122.26959229 -122.2886647 -122.41327822 -122.28653312 -121.886943 -122.4338119 -122.391034 -122.2532529 -122.4154077 -122.270556 -122.28105068 -121.899464 -122.2871801 -122.2709501 -122.40735859 -122.40764558 -122.2734217 -121.89479783 -122.40853071 -122.255547 -122.2653043 -122.4025701 -122.415674 -122.26952791 -122.41995692 -122.2432677 -122.44729131 -121.93 -122.4130036 -122.42242321 -122.4152042 -122.2629973 -122.2663179 -121.9048882 -121.8807965 -122.409851 -122.394643 -122.4249863 -122.2618225 -122.418954 -122.4368608 -122.4201102 -122.3947713 -122.43327409 -122.2640037 -122.26631463 -122.26240933 -122.2876102 -122.26055324 -122.4154425 -122.410807 -122.26556897 -121.88321471 -122.2760402 -122.26938441 -122.2525233 -122.2748943 -122.42214024 -122.2675583 -122.45309293 -122.38879502 -122.39997 -122.2567156 -122.39828467 -122.2699271 -122.29352832 -122.41874 -122.261351 -122.2177284 -122.398525 -122.4318042 -122.39843756 -122.2516207 -122.272968 -122.399051 -122.267738 -122.258732 -122.25190043 -122.29967594 -122.2521599 -122.27972031 -122.2893702 -121.88176632 -122.26043655 -121.8759263 -121.8959209 -121.9125165 -122.3995794 -122.2757325 -122.2782669 -122.4231805 -122.3904285 -122.2717615 -122.41823 -122.4257277 -122.27173805 -122.2644881 -122.42193371 -122.2619284 -122.410662 -122.2651925 -121.8888891 -121.8841054 -122.28328228 -122.2856333 -121.90457582 -122.43455887 -121.9031829 -122.27484405 -121.877349 -122.2678864 -121.96 -122.27057934 -122.2880451 -122.279352 -121.900084 -122.26017237 -121.888979 -122.29339957 -122.2894154 -122.2739367 -122.2343822 -121.8741186 -122.2811926 -122.2851712 -121.8940901 -121.885831 -122.26356536 -122.2912095 -121.89193726 -121.8969655 -122.2720799 -122.26658821 
-122.28312671 -122.2515349 -122.282497 -122.26449609 -122.24237323 -121.886995 -121.8977018 -121.92 -122.38565549 -121.894902 -122.2512714 -122.2226033 -121.884995 -121.889937 -122.39289612 -121.89034939 -121.8892731 -121.8932 -122.2130372 -122.2244982 -121.95 -121.94 -122.2651002 -122.39896327 -121.9020161 -121.905733 -121.8810904 -122.2254 -122.2329915 -122.39629179 -122.23930478 -121.8831724 -121.98 -122.39593562] unique count: 335 max range: -121.8741186 min range: -122.45370447635652 max frequncy: 4857 min frequncy: 2
symmetric_difference(df['start_station_longitude'].dropna(),df['end_station_longitude'].dropna())
array([], dtype=float64)
descrip_column(df['bike_id'])
name: bike_id dtype: int64 null count: 0 unique: [4902 2535 5905 ... 4208 3655 5067] unique count: 4646 max range: 6645 min range: 11 max frequncy: 191 min frequncy: 1
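`descrip_column` is also imported from `plot_assist`, and its implementation is not shown. As a rough sketch, assuming it simply gathers the fields printed above, the same summary could be built with standard pandas calls. The function name `describe_column_sketch` and the toy series are hypothetical.

```python
import pandas as pd


def describe_column_sketch(col):
    """Collect summary fields similar to those printed above (hypothetical stand-in)."""
    counts = col.value_counts()  # NaN is excluded by default
    summary = {
        "name": col.name,
        "dtype": str(col.dtype),
        "null count": int(col.isnull().sum()),
        "unique count": int(col.nunique()),
        "max frequency": int(counts.max()),
        "min frequency": int(counts.min()),
    }
    # A value range is only meaningful for numeric columns.
    if pd.api.types.is_numeric_dtype(col):
        summary["max range"] = col.max()
        summary["min range"] = col.min()
    return summary


toy = pd.Series([11, 11, 42, 6645], name="bike_id")
summary = describe_column_sketch(toy)
print(summary)
```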
descrip_column(df['user_type'])
name: user_type dtype: object null count: 0 unique: ['Customer' 'Subscriber'] unique count: 2 max frequncy: 163544 min frequncy: 19868
The user_type variable is categorical (nominal): either Customer or Subscriber.
descrip_column(df['member_birth_year'])
name: member_birth_year dtype: float64 null count: 8265 unique: [1984. nan 1972. 1989. 1974. 1959. 1983. 1988. 1992. 1996. 1993. 1990. 1981. 1975. 1978. 1991. 1997. 1986. 2000. 1982. 1995. 1980. 1973. 1985. 1971. 1979. 1967. 1998. 1994. 1977. 1999. 1987. 1969. 1963. 1976. 1964. 1965. 1961. 1968. 1966. 1962. 1954. 1958. 1960. 1970. 1956. 1957. 1945. 1900. 1952. 1948. 1951. 1941. 1950. 1949. 1953. 1955. 1946. 1947. 1931. 1943. 1942. 1920. 1933. 2001. 1878. 1901. 1944. 1928. 1934. 1939. 1930. 1902. 1910. 1938. 1927.] unique count: 75 max range: 2001.0 min range: 1878.0 max frequncy: 10236 min frequncy: 1
member_birth_year is stored as a float, which is not appropriate for describing age; it should be converted to int.
descrip_column(df['member_gender'])
name: member_gender dtype: object null count: 8265 unique: ['Male' nan 'Other' 'Female'] unique count: 3 max frequncy: 130651 min frequncy: 3652
The member_gender variable is categorical (nominal): Male, Female or Other.
descrip_column(df['bike_share_for_all_trip'])
name: bike_share_for_all_trip dtype: object null count: 0 unique: ['No' 'Yes'] unique count: 2 max frequncy: 166053 min frequncy: 17359
The bike_share_for_all_trip variable is categorical (nominal): either No or Yes.
First, we extract the member age from the member_birth_year feature, relative to 2019 (the year of the trips), and store it as a new column, member_age.
df['member_age'] = 2019 - df['member_birth_year']
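A small illustration of how this subtraction behaves, using toy values rather than the real data. Missing birth years stay missing, and very old birth years such as the 1878 seen in the column summary turn into implausible ages. The 100-year cutoff below is an assumption made only for this sketch and is not part of the analysis.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({"member_birth_year": [1984.0, np.nan, 1878.0]})

# Subtraction propagates NaN, so the result stays float64 until the
# missing rows are handled.
toy["member_age"] = 2019 - toy["member_birth_year"]

# Hypothetical sanity check: the 100-year cutoff is an assumption.
implausible = toy["member_age"] > 100
print(toy.assign(implausible=implausible))
```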
Next, we split the start_time column into year, month, weekday, day and hour, ignoring minutes and seconds because they will not materially affect our analysis.
datetime = pd.to_datetime(df['start_time'])
df['start_year'] = datetime.dt.year
df['start_month'] = datetime.dt.month
df['start_weekday'] = datetime.dt.strftime('%A')
df['start_day'] = datetime.dt.day
df['start_hour'] = datetime.dt.hour
We then split the end_time column in the same way into year, month, weekday, day and hour, again ignoring minutes and seconds.
datetime = pd.to_datetime(df['end_time'])
df['end_year'] = datetime.dt.year
df['end_month'] = datetime.dt.month
df['end_weekday'] = datetime.dt.strftime('%A')
df['end_day'] = datetime.dt.day
df['end_hour'] = datetime.dt.hour
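Because the start_time and end_time steps are identical apart from the column prefix, they could be folded into one helper. The following is a minimal sketch on a one-row toy frame. The helper name `add_time_parts` is made up for illustration, and it uses `dt.day_name()`, which should yield the same English weekday names as `strftime('%A')` under the default locale.

```python
import pandas as pd


def add_time_parts(frame, source, prefix):
    """Add year, month, weekday, day and hour columns derived from `source`."""
    stamps = pd.to_datetime(frame[source])
    frame[f"{prefix}_year"] = stamps.dt.year
    frame[f"{prefix}_month"] = stamps.dt.month
    frame[f"{prefix}_weekday"] = stamps.dt.day_name()
    frame[f"{prefix}_day"] = stamps.dt.day
    frame[f"{prefix}_hour"] = stamps.dt.hour
    return frame


# One trip copied from the first row of the raw data preview.
toy = pd.DataFrame({
    "start_time": ["2019-02-28 17:32:10.1450"],
    "end_time": ["2019-03-01 08:01:55.9750"],
})
for prefix in ("start", "end"):
    add_time_parts(toy, f"{prefix}_time", prefix)
print(toy.iloc[0])
```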
Since the latitude/longitude columns have no missing values, we can replace the missing station names/ids with made-up ones, assigning one placeholder per distinct coordinate pair.
temp1 = df[['end_station_latitude','end_station_longitude']][df['end_station_id'].isnull()]
temp2 = df[['start_station_latitude','start_station_longitude']][df['start_station_id'].isnull()]
temp1 = temp1.rename(columns={'end_station_latitude':'latitude','end_station_longitude':'longitude'})
temp2 = temp2.rename(columns={'start_station_latitude':'latitude','start_station_longitude':'longitude'})
union = pd.concat([temp1, temp2], ignore_index=True,axis=0).drop_duplicates()
union['id'] = np.arange(union.shape[0]) + 400
union['name'] = [f'UNKNOWN LOCATION {i}' for i in range(union.shape[0])]
for i in range(union.shape[0]):
    # rows whose end coordinates match the i-th unknown location
    mask = (df[['end_station_latitude','end_station_longitude']].values == union[['latitude','longitude']].values[i]).all(axis=1)
    df.loc[mask,['end_station_id','end_station_name']] = union[['id','name']].values[i]
    # rows whose start coordinates match the i-th unknown location
    mask = (df[['start_station_latitude','start_station_longitude']].values == union[['latitude','longitude']].values[i]).all(axis=1)
    df.loc[mask,['start_station_id','start_station_name']] = union[['id','name']].values[i]
df[df['end_station_id'] >= 400]
| duration_sec | start_time | end_time | start_station_id | start_station_name | start_station_latitude | start_station_longitude | end_station_id | end_station_name | end_station_latitude | ... | start_year | start_month | start_weekday | start_day | start_hour | end_year | end_month | end_weekday | end_day | end_hour | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 475 | 1709 | 2019-02-28 20:55:53.9320 | 2019-02-28 21:24:23.7380 | 406.0 | UNKNOWN LOCATION 6 | 37.40 | -121.94 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | ... | 2019 | 2 | Thursday | 28 | 20 | 2019 | 2 | Thursday | 28 | 21 |
| 1733 | 1272 | 2019-02-28 18:32:34.2730 | 2019-02-28 18:53:46.7270 | 406.0 | UNKNOWN LOCATION 6 | 37.40 | -121.94 | 401.0 | UNKNOWN LOCATION 1 | 37.41 | ... | 2019 | 2 | Thursday | 28 | 18 | 2019 | 2 | Thursday | 28 | 18 |
| 3625 | 142 | 2019-02-28 17:10:46.5290 | 2019-02-28 17:13:09.4310 | 405.0 | UNKNOWN LOCATION 5 | 37.41 | -121.95 | 401.0 | UNKNOWN LOCATION 1 | 37.41 | ... | 2019 | 2 | Thursday | 28 | 17 | 2019 | 2 | Thursday | 28 | 17 |
| 4070 | 585 | 2019-02-28 16:28:45.9340 | 2019-02-28 16:38:31.3320 | 403.0 | UNKNOWN LOCATION 3 | 37.39 | -121.93 | 402.0 | UNKNOWN LOCATION 2 | 37.40 | ... | 2019 | 2 | Thursday | 28 | 16 | 2019 | 2 | Thursday | 28 | 16 |
| 5654 | 509 | 2019-02-28 12:30:17.1310 | 2019-02-28 12:38:46.3290 | 402.0 | UNKNOWN LOCATION 2 | 37.40 | -121.92 | 403.0 | UNKNOWN LOCATION 3 | 37.39 | ... | 2019 | 2 | Thursday | 28 | 12 | 2019 | 2 | Thursday | 28 | 12 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 176154 | 1447 | 2019-02-02 12:03:04.5440 | 2019-02-02 12:27:12.2670 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | -121.93 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | ... | 2019 | 2 | Saturday | 2 | 12 | 2019 | 2 | Saturday | 2 | 12 |
| 179730 | 309 | 2019-02-01 12:59:45.9690 | 2019-02-01 13:04:55.4260 | 406.0 | UNKNOWN LOCATION 6 | 37.40 | -121.94 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | ... | 2019 | 2 | Friday | 1 | 12 | 2019 | 2 | Friday | 1 | 13 |
| 179970 | 659 | 2019-02-01 12:17:37.6750 | 2019-02-01 12:28:37.0140 | 401.0 | UNKNOWN LOCATION 1 | 37.41 | -121.96 | 407.0 | UNKNOWN LOCATION 7 | 37.41 | ... | 2019 | 2 | Friday | 1 | 12 | 2019 | 2 | Friday | 1 | 12 |
| 180106 | 2013 | 2019-02-01 11:33:55.1470 | 2019-02-01 12:07:28.9400 | 406.0 | UNKNOWN LOCATION 6 | 37.40 | -121.94 | 406.0 | UNKNOWN LOCATION 6 | 37.40 | ... | 2019 | 2 | Friday | 1 | 11 | 2019 | 2 | Friday | 1 | 12 |
| 181201 | 312 | 2019-02-01 09:26:34.8030 | 2019-02-01 09:31:46.9210 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | -121.93 | 400.0 | UNKNOWN LOCATION 0 | 37.40 | ... | 2019 | 2 | Friday | 1 | 9 | 2019 | 2 | Friday | 1 | 9 |
197 rows × 27 columns
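The placeholder assignment above loops over every unknown location and scans the whole frame each time. One possible alternative, sketched here on a toy frame, is to join on the coordinate pair instead. Note one difference: this version only fills values that are actually missing, whereas the loop overwrites every row whose coordinates match. The toy station "Known St" and all values are invented for the example.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "start_station_id": [21.0, np.nan, np.nan],
    "start_station_name": ["Known St", None, None],
    "start_station_latitude": [37.79, 37.40, 37.41],
    "start_station_longitude": [-122.40, -121.94, -121.95],
    "end_station_id": [np.nan, np.nan, 21.0],
    "end_station_name": [None, None, "Known St"],
    "end_station_latitude": [37.40, 37.41, 37.79],
    "end_station_longitude": [-121.94, -121.95, -122.40],
})

# Collect the distinct coordinates of every trip end that lacks a station id
# (end first, then start, mirroring the order used in the loop version).
parts = []
for side in ("end", "start"):
    missing = toy[toy[f"{side}_station_id"].isnull()]
    parts.append(pd.DataFrame({
        "latitude": missing[f"{side}_station_latitude"].to_numpy(),
        "longitude": missing[f"{side}_station_longitude"].to_numpy(),
    }))
lookup = pd.concat(parts, ignore_index=True).drop_duplicates().reset_index(drop=True)
lookup["id"] = np.arange(len(lookup)) + 400
lookup["name"] = [f"UNKNOWN LOCATION {i}" for i in range(len(lookup))]

# Fill each side by joining on the coordinate pair instead of looping.
for side in ("end", "start"):
    lat, lon = f"{side}_station_latitude", f"{side}_station_longitude"
    keys = toy[[lat, lon]].rename(columns={lat: "latitude", lon: "longitude"})
    # A left join keeps one output row per trip, in the original order.
    matched = keys.merge(lookup, on=["latitude", "longitude"], how="left")
    toy[f"{side}_station_id"] = toy[f"{side}_station_id"].fillna(
        pd.Series(matched["id"].to_numpy(), index=toy.index))
    toy[f"{side}_station_name"] = toy[f"{side}_station_name"].fillna(
        pd.Series(matched["name"].to_numpy(), index=toy.index))
print(toy[["start_station_id", "start_station_name", "end_station_id", "end_station_name"]])
```

Both versions rely on exact floating-point equality of the coordinates, which works here because the values are compared against copies of themselves.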
Now we drop start_time, end_time and member_birth_year, as they are no longer needed.
df.drop(columns=['start_time','end_time','member_birth_year'],inplace=True)
Since we cannot determine the member gender for the missing values from the other features, we drop these rows.
df = df[~df['member_gender'].isnull()]
Finally, we change the data type of the start_station_id, end_station_id and member_age columns from float to the more suitable int.
columns = ['start_station_id','end_station_id','member_age']
df[columns] = df[columns].astype(np.int64)
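The integer cast only succeeds once no missing values remain. In this dataset the rows lacking member_gender appear to be the same rows lacking a birth year, since both columns report 8265 nulls. The toy sketch below shows the failure mode and two ways around it. The nullable `Int64` option is an alternative that the analysis itself does not use.

```python
import numpy as np
import pandas as pd

ages = pd.Series([35.0, np.nan, 47.0], name="member_age")

# Plain int64 cannot represent NaN, so the cast fails while gaps remain.
try:
    ages.astype(np.int64)
    cast_failed = False
except (ValueError, TypeError):
    cast_failed = True

# Option 1: remove the missing rows first, then cast.
as_int = ages.dropna().astype(np.int64)

# Option 2 (alternative, not used in the analysis): pandas' nullable Int64.
as_nullable = ages.astype("Int64")
print(cast_failed, as_int.dtype, as_nullable.dtype)
```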
We skip this step, as there is no need for enriching at this time.
We skip this step, as there is no need for validation.
Now that everything is set, we are ready to save our data.
df.to_csv('data/fordgobike_tripdata_clean.csv', index=False)
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 175147 entries, 0 to 183411 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 duration_sec 175147 non-null int64 1 start_station_id 175147 non-null int64 2 start_station_name 175147 non-null object 3 start_station_latitude 175147 non-null float64 4 start_station_longitude 175147 non-null float64 5 end_station_id 175147 non-null int64 6 end_station_name 175147 non-null object 7 end_station_latitude 175147 non-null float64 8 end_station_longitude 175147 non-null float64 9 bike_id 175147 non-null int64 10 user_type 175147 non-null object 11 member_gender 175147 non-null object 12 bike_share_for_all_trip 175147 non-null object 13 member_age 175147 non-null int64 14 start_year 175147 non-null int64 15 start_month 175147 non-null int64 16 start_weekday 175147 non-null object 17 start_day 175147 non-null int64 18 start_hour 175147 non-null int64 19 end_year 175147 non-null int64 20 end_month 175147 non-null int64 21 end_weekday 175147 non-null object 22 end_day 175147 non-null int64 23 end_hour 175147 non-null int64 dtypes: float64(4), int64(13), object(7) memory usage: 33.4+ MB
plt_hist_uni(df['duration_sec'],xlim=(0,4000),bins=300)
(df['duration_sec'] <= 60*60).mean() * 100
99.20638092573667
(df['duration_sec'] <= 30*60).mean() * 100
96.55432294015884
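The two percentages above rely on the fact that the mean of a boolean mask equals the fraction of True values. A tiny worked example with made-up durations (not the real data) makes the arithmetic explicit:

```python
import pandas as pd

# Five invented trip durations in seconds.
durations = pd.Series([300, 900, 1700, 2500, 52185], name="duration_sec")

# Each comparison yields True/False per trip; the mean is the share of True.
share_within_hour = (durations <= 60 * 60).mean() * 100
share_within_half_hour = (durations <= 30 * 60).mean() * 100
print(share_within_hour, share_within_half_hour)
```

Here four of the five trips last at most an hour and three last at most half an hour.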
bar_plot_uni(df['user_type'],figsize=(14,6))
pie_plot(df['user_type'])
bar_plot_uni(df['member_gender'],figsize=(14,6))
pie_plot(df['member_gender'])
bar_plot_uni(df['bike_share_for_all_trip'],figsize=(14,6))
pie_plot(df['bike_share_for_all_trip'])
bar_plot_uni(df['member_age'],figsize=(14,6),percentages=False)
bar_plot_uni(df['start_year'],figsize=(14,6))
bar_plot_uni(df['end_year'],figsize=(14,6))
bar_plot_uni(df['start_month'],figsize=(14,6))
bar_plot_uni(df['end_month'],figsize=(14,6))
order = ['Saturday', 'Sunday','Monday','Tuesday','Wednesday','Thursday','Friday']
bar_plot_uni(df['start_weekday'],figsize=(14,6),order=order)
bar_plot_uni(df['end_weekday'],figsize=(14,6),order=order)
The start_weekday and end_weekday distributions are almost the same, since 99.2% of the trips last one hour or less.
bar_plot_uni(df['start_day'],figsize=(14,6))
bar_plot_uni(df['end_day'],figsize=(14,6))
The start_day and end_day distributions are almost the same, since 99.2% of the trips last one hour or less.
bar_plot_uni(df['start_hour'],figsize=(14,6))
bar_plot_uni(df['end_hour'],figsize=(14,6))
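How closely the start and end fields agree can also be checked numerically rather than by comparing plots. The sketch below uses invented rows. On the real data the same expression would be applied to `df`, and the resulting figures are not known here.

```python
import pandas as pd

toy = pd.DataFrame({
    "start_weekday": ["Thursday", "Thursday", "Friday", "Saturday"],
    "end_weekday": ["Thursday", "Friday", "Friday", "Saturday"],
    "start_hour": [17, 23, 9, 12],
    "end_hour": [17, 0, 9, 13],
})

# Percentage of trips whose start and end values are identical, per field.
agreement = {
    part: (toy[f"start_{part}"] == toy[f"end_{part}"]).mean() * 100
    for part in ("weekday", "hour")
}
print(agreement)
```

Even a short trip can cross an hour boundary, so row-level agreement for the hour is expected to be lower than for the weekday, although the overall distributions can still look alike.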
The start_hour and end_hour distributions are almost the same, since 99.2% of the trips last one hour or less.
bar_plot_uni(df['start_station_name'],figsize=(6,60),vert=True,percentages=False,order='total')
bar_plot_uni(df['end_station_name'],figsize=(6,60),vert=True,percentages=False,order='total')
symmetric_difference(df['end_station_name'].value_counts()[:11].index,df['start_station_name'].value_counts()[:11].index)
array([], dtype=float64)
df.info()
<class 'pandas.core.frame.DataFrame'> Int64Index: 175147 entries, 0 to 183411 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 duration_sec 175147 non-null int64 1 start_station_id 175147 non-null int64 2 start_station_name 175147 non-null object 3 start_station_latitude 175147 non-null float64 4 start_station_longitude 175147 non-null float64 5 end_station_id 175147 non-null int64 6 end_station_name 175147 non-null object 7 end_station_latitude 175147 non-null float64 8 end_station_longitude 175147 non-null float64 9 bike_id 175147 non-null int64 10 user_type 175147 non-null object 11 member_gender 175147 non-null object 12 bike_share_for_all_trip 175147 non-null object 13 member_age 175147 non-null int64 14 start_year 175147 non-null int64 15 start_month 175147 non-null int64 16 start_weekday 175147 non-null object 17 start_day 175147 non-null int64 18 start_hour 175147 non-null int64 19 end_year 175147 non-null int64 20 end_month 175147 non-null int64 21 end_weekday 175147 non-null object 22 end_day 175147 non-null int64 23 end_hour 175147 non-null int64 dtypes: float64(4), int64(13), object(7) memory usage: 33.4+ MB
box_plot(df,'user_type','duration_sec',ylim=(0,2500))
plt_hist(df['user_type'],df['duration_sec'],bins=2000,scale='linear',xlim=(61,4000),histtype='step')
box_plot(df,'member_gender','duration_sec',ylim=(0,2000),showfliers=False)
plt_hist(df['member_gender'],df['duration_sec'],bins=2000,scale='linear',xlim=(61,4000),histtype='step')
box_plot(df,'bike_share_for_all_trip','duration_sec',ylim=(0,2000),showfliers=False)
plt_hist(df['bike_share_for_all_trip'],df['duration_sec'],bins=2000,scale='linear',xlim=(61,4000),histtype='step')
plt_hist(df['bike_share_for_all_trip'],df['duration_sec'],bins=300,scale='log',xlim=(61,4000),histtype='step')
order = ['Saturday', 'Sunday','Monday','Tuesday','Wednesday','Thursday','Friday']
box_plot(df,'start_weekday','duration_sec',ylim=(0,2000),showfliers=False,order = order)
plt_hist(df['start_weekday'],df['duration_sec'],bins=2000,scale='linear',xlim=(61,4000),histtype='step')
box_plot(df,'start_hour','duration_sec',ylim=(0,2000),showfliers=False)
order = ['Saturday', 'Sunday','Monday','Tuesday','Wednesday','Thursday','Friday']
box_plot_multi(df,'start_weekday','duration_sec','user_type',ylim=(0,3500),showfliers=False,order=order)
box_plot_multi(df,'start_hour','duration_sec','user_type',ylim=(0,3500),showfliers=False)
order = ['Saturday', 'Sunday','Monday','Tuesday','Wednesday','Thursday','Friday']
box_plot_multi(df,'start_weekday','duration_sec','bike_share_for_all_trip',ylim=(0,2000),showfliers=False,order=order)
box_plot_multi(df,'start_hour','duration_sec','bike_share_for_all_trip',ylim=(0,2500),showfliers=False)
order = ['Saturday', 'Sunday','Monday','Tuesday','Wednesday','Thursday','Friday']
box_plot_multi(df,'start_weekday','duration_sec','member_gender',ylim=(0,2500),showfliers=False,order=order)
box_plot_multi(df,'start_hour','duration_sec','member_gender',ylim=(0,2500),showfliers=False)